This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1408] Adding test to verify Large Tensor Support for ravel and unravel #15048

Merged
merged 1 commit into apache:master from access2rohit:master on Jun 1, 2019

Conversation

access2rohit (Contributor) commented May 22, 2019:

Description

New Ops: ravel_multi_index, unravel_index
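
For reference, this is the index mapping the two ops implement, shown with their NumPy equivalents; the shape values below are illustrative, picked so the flat index crosses the 2**32 boundary.

```python
import numpy as np

shape = (100000000, 50)                    # rows * cols > 2**32

# ravel: 2-D coordinates -> flat index (row * cols + col)
flat = np.ravel_multi_index([[99999999], [49]], shape)
print(flat)                                # [4999999999], past the int32 range

# unravel: flat index -> 2-D coordinates
coords = np.unravel_index(flat, shape)
print(coords)                              # (array([99999999]), array([49]))
```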

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-1408], where MXNET-1408 refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this is a backward-incompatible change, why must it be made
  • Interesting edge cases to note here

access2rohit (Contributor Author):

@mxnet-label-bot add [pr-awaiting-review]

marcoabreu added the pr-awaiting-review label (PR is waiting for code review) on May 23, 2019
access2rohit (Contributor Author):

@apeforest: Please review

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():

Contributor:

break it into two tests

Contributor Author:

It's more readable and understandable this way. Also, I have split it into two functions inside.


def test_unravel_index():
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
assert (original_2d_idxs.asnumpy() == np.array(idxs_2d)).all() == True

Contributor:

remove == True

Contributor Author:

ok
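
For context, a self-contained sketch of the two split tests being discussed, verified against NumPy. LARGE_X and SMALL_Y mirror the constants in tests/nightly/test_large_array.py (the exact values are assumed here) and are chosen so LARGE_X * SMALL_Y exceeds 2**32; running this requires an MXNet build with int64 tensor support. This is a sketch, not the merged code.

```python
import numpy as np
import mxnet as mx

LARGE_X = 100000000   # assumed value of the nightly-test constant
SMALL_Y = 50          # assumed value; LARGE_X * SMALL_Y = 5e9 > 2**32

def test_ravel_multi_index():
    indices_2d = [[LARGE_X - 1, LARGE_X - 100, 6], [SMALL_Y - 1, SMALL_Y - 10, 1]]
    idx = mx.nd.ravel_multi_index(mx.nd.array(indices_2d, dtype=np.int64),
                                  shape=(LARGE_X, SMALL_Y))
    idx_numpy = np.ravel_multi_index(indices_2d, (LARGE_X, SMALL_Y))
    # All three flattened indices must match NumPy's reference result.
    assert (idx.asnumpy() == idx_numpy).all()

def test_unravel_index():
    original_2d_indices = [[LARGE_X - 1, LARGE_X - 100, 6], [SMALL_Y - 1, SMALL_Y - 10, 1]]
    flat = np.ravel_multi_index(original_2d_indices, (LARGE_X, SMALL_Y))
    unraveled = mx.nd.unravel_index(mx.nd.array(flat, dtype=np.int64),
                                    shape=(LARGE_X, SMALL_Y))
    # Round-trip: unraveling the flat indices recovers the coordinates.
    assert (unraveled.asnumpy() == np.array(original_2d_indices)).all()
```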

access2rohit force-pushed the master branch 2 times, most recently from 7c915e3 to e2970e9 on May 23, 2019 23:48
@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]

marcoabreu (Contributor) commented May 24, 2019:

Please use C++ tests with mocking of the memory allocation instead of actually allocating memory.

Although this is nightly, I don't think it's necessary for these kinds of tests to actually allocate real memory.


Contributor:

We can add a C++ unit test to mock the memory allocation. However, I thought the nightly test is used for end-to-end testing, to mimic the real deployment situation, or is it not?

Contributor:

They are to some extent, but in some cases the cost-to-return ratio is not that good. While smoke tests are certainly necessary in some cases, resource-intensive ones can usually be abstracted away. If we assume that memory access on the system side works properly and we only mock this very last layer, the result should be pretty much as meaningful as running it end to end.

Contributor Author:

@marcoabreu: I didn't understand your last comment. What did you mean by "very last layer"? All of these are single-operator tests. I combined these tests because it's easier to test them together.

Contributor:

Hi @marcoabreu, thanks a lot for your suggestion; it is a very good learning experience for me too. Large tensor support is crucial to DGL (Deep Graph Library), which uses MXNet and plans to release with support for large tensors. Without this test, we cannot make this feature production ready.

While I appreciate the testing direction you suggested and would definitely like to explore it further, I am afraid that holding back these tests until we implement the approach you suggested will block the release of DGL, and of other MXNet projects that depend on large tensors as well.

Can we have a separate project to address test efficiency (we may need some expertise from the CI team for it) while moving on with the large tensor project? Please let me know if you have other concerns or advice. Thank you.

Contributor Author:

@marcoabreu: We have to test cases where large tensors are allocated in actual physical memory. That is the whole purpose of adding these tests; without them we cannot support, or even guarantee, large dense tensor support in MXNet.

The purpose of these tests is not at all to test only sparse tensors. Currently we are not addressing performance issues with large dense arrays but trying to support them. From your reply it seems you are under the impression that everything should work fine once a tensor exceeds 2^32 elements (please correct me if I am wrong). That is not the case: the operator simply segfaults or behaves unpredictably. That is why it is necessary to test with such large arrays/tensors.

Contributor Author:

@marcoabreu: Let me know if you have any other unanswered queries; answering them should help you understand the use case better and unblock this PR.

Contributor:

Hey,

I understand the importance of large tensors and I'm really looking forward to them as well. Don't get me wrong, I'm not underestimating the work that comes with supporting them.

We already have a history of issues caused by too-large allocations. Thus, for the sake of keeping CI stable (independent of whether we're running per-PR or nightly) and of adhering to a high test standard, I'd prefer that we not delay addressing the test strategy.

I think there's a misunderstanding. With my proposed approach, you would still be able to fully test a single operator. But instead of randomly generating the input tensor, you would use a generator model that gives you a predictable, procedurally generated input tensor instead of a large, randomly generated one.

The idea is to feed this procedurally generated input tensor to the operator; since the input is predictable, you also get a predictable output that you can verify. Along the way, the operator runs as usual; it's just that the source of the data is not physical memory but generated values.

The only case that wouldn't be supported is the chaining of operators, since that would require storing the intermediate values.

The point is that (in whatever fashion you'd like) we generate fake input and output tensors that return predictable and valid data but don't actually consume RAM. From MXNet's perspective, it shouldn't matter where the values come from. For an operator, what matters is that it actually retrieves proper values, and that would be the case here.
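
A minimal sketch, in plain Python, of the "generator model" idea above; the class and names are illustrative, not an MXNet API. It shows how an input whose values are a pure function of the index can be spot-checked at positions beyond 2**32 without allocating RAM.

```python
class VirtualTensor:
    # Fake tensor: values are computed on demand, never stored.
    def __init__(self, shape, value_at):
        self.shape = shape
        self.value_at = value_at          # maps an index tuple to a value

    def __getitem__(self, idx):
        return self.value_at(idx)

shape = (4_300_000_000, 2)                # > 2**32 elements, never materialized
t = VirtualTensor(shape, lambda idx: idx[0] % 7 + idx[1])

# Spot-check an element past the 2**32 boundary: the expected value is
# known analytically, so no reference tensor is needed either.
assert t[(2**32 + 5, 1)] == 3             # (2**32 + 5) % 7 == 2, plus 1
```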

Contributor Author:

@marcoabreu: I still don't understand how I would know whether things work as expected on an actual large tensor after real memory allocation. Procedurally generating the values will indeed give me values at locations indexed beyond 2^32, but where the input comes from does matter: it needs to be tested in actual physical memory so there are no surprises. If nightly tests are not feasible, do we have a weekly test where we can run this?

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]

Contributor:

nit: space between operators and operands

idx = mx.nd.array(1)
def test_ravel_multi_index():
idx = mx.nd.ravel_multi_index(mx.nd.array(idxs_2d, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
idx_numpy = np.ravel_multi_index(idxs_2d, (LARGE_X,SMALL_Y))

Contributor:

nit: space. By the way, do you see any pylint warnings for these?

assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3

def test_unravel_index():
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))

Contributor:

same: add space

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]

Contributor:

nit: do you mean indices_2d?

Contributor Author:

Yes, though make pylint gives this output:

ubuntu@ip-172-31-82-110 ~/mxnet3 (master) $ make pylint
Makefile:345: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
python3 -m pylint --rcfile=/home/ubuntu/mxnet3/ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet tools/caffe_converter/*.py
Using config file /home/ubuntu/mxnet3/ci/other/pylintrc


Your code has been rated at 10.00/10

I will still fix it. Dunno why pylint isn't catching it.

Contributor Author:

got it ... the mxnet Makefile doesn't run pylint on tests/nightly and tests/python/unittest ... added them manually in my local repo. Thanks for the advice!

import numpy as np
import mxnet as mx

Contributor Author:

Reordering of imports, as suggested by pylint.

@@ -130,13 +130,6 @@ def test_clip():
assert np.sum(res[-1].asnumpy() == 1000) == a.shape[1]


def test_take():

Contributor Author:

This was a duplicate.

idx_numpy = np.ravel_multi_index(indices_2d, (LARGE_X, SMALL_Y))
assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3


Contributor:

remove extra line

Contributor Author:

done :)

marcoabreu (Contributor) left a comment:

I just talked to Lin and we agreed on merging the end-to-end tests for now. The condition is that they have to be replaced by the 15th of July, or by the next minor release that includes large tensor support, whichever comes first.

@@ -254,8 +247,8 @@ def numpy_space_to_depth(x, blocksize):


def test_diag():
h = np.random.randint(2,9)
w = np.random.randint(2,9)
h = np.random.randint(2, 9)

Contributor:

I do not see the necessity of creating h and w variables here.

Contributor:

If you decide to move forward with them, please add the @with_seed annotation to the test.

Contributor Author:

@apeforest: That's how the unittest for the diagonal operator is implemented.
@marcoabreu: Good point, will add @with_seed().
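
A usage sketch for the @with_seed() annotation mentioned above; the decorator lives in the repo's test utilities (tests/python/unittest/common.py at the time; the import path is an assumption), and reseeds and logs the RNG so the random h and w can be reproduced when a failure is replayed.

```python
import numpy as np
import mxnet as mx
from common import with_seed  # assumption: test file sits next to common.py

@with_seed()
def test_diag():
    # Random dimensions become reproducible: the decorator logs the seed.
    h = np.random.randint(2, 9)
    w = np.random.randint(2, 9)
    a_np = np.random.random((h, w)).astype(np.float32)
    a = mx.nd.array(a_np)
    k = 1
    r = mx.nd.diag(a, k=k)
    np.testing.assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))
```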

r = mx.nd.diag(a, k=k)
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_multi_index():
indices_2d = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]

Contributor:

can we generate the random indices instead?



def test_unravel_index():
original_2d_indices = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]

Contributor:

same here: can we generate random indices? You can pass the 2d array in as input so you only keep one copy.

access2rohit (Contributor Author) commented May 31, 2019:

The input is not a 2d array; we pass indices as input. For random large indices I can do the following:

    low_large_value = 2**32
    high_large_value = 2**34
    a = nd.random.randint(low_large_value, high_large_value, dtype=np.int64)

Good point.
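
A runnable version of the snippet above; the bounds come from the comment, while the shape is illustrative. mx.nd.random.randint draws int64 values, here three indices guaranteed to lie beyond the 32-bit boundary.

```python
import numpy as np
import mxnet as mx

low_large_value = 2**32
high_large_value = 2**34

# Three random int64 indices in [2**32, 2**34).
a = mx.nd.random.randint(low_large_value, high_large_value,
                         shape=(3,), dtype=np.int64)
print(a.asnumpy())
```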

@@ -216,8 +209,8 @@ def test_where():

def test_pick():
a = mx.nd.ones(shape=(256*35, 1024*1024))
b = mx.nd.ones(shape=(256*35,))
res = mx.nd.pick(a,b)
b = mx.nd.ones(shape=(256*35, ))

Contributor:

nit: space around the * operator

r = mx.nd.diag(a, k=k)
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def random_2d_coordinates(x_low, x_high, y_low, y_high):

Contributor Author:

It is different: we need one value that is very large for the test case to be successful, not just any value between (1, very large value).

Contributor Author:

will move this to test_utils.py
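
The helper's body is not shown in the thread, so the following is only a plausible sketch inferred from its name and the discussion above; the signature comes from the diff, the body is an assumption. The x range can be placed near LARGE_X so the resulting flat index crosses 2**32.

```python
import numpy as np
import mxnet as mx

def random_2d_coordinates(x_low, x_high, y_low, y_high):
    # Draw matching x and y coordinate vectors within the given ranges.
    x = mx.nd.random.randint(x_low, x_high, shape=(3,), dtype=np.int64)
    y = mx.nd.random.randint(y_low, y_high, shape=(3,), dtype=np.int64)
    return x, y

# Example: coordinates for a (100000000, 50) shape whose raveled index
# can exceed 2**32.
x, y = random_2d_coordinates(100000000 - 100, 100000000, 0, 50)
```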

apeforest (Contributor) left a comment:

LGTM

apeforest merged commit 75a90d0 into apache:master on Jun 1, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
pr-awaiting-review PR is waiting for code review
3 participants